Co-action equilibria and strategy switchings in a stochastic minority game 
We propose a variation of the minority game in which $N$ rational agents use probabilistic strategies. The absence of quenched disorder in the model makes it exactly soluble in terms of simple, exact, mean-field-like self-consistent equations for the optimal probability assignments to actions. We propose an alternative to the standard Nash equilibrium, called co-action equilibrium, which gives higher expected payoff for all agents. Parameters of the optimal strategy depend on the future time-horizon of agents, parameterized by a real variable $\lambda$, and are non-analytic functions of $\lambda$, even for a finite number of agents. The solution for $N \leq 7$ is worked out explicitly.
There has been a lot of interest in applying techniques of statistical physics to economics in the past two decades, and this has led to better modeling and analysis of the collective behavior of interacting agents, as in a market. The prototypical model in this subject is the Minority Game (MG), introduced by Challet and Zhang [1], which drew inspiration from the El Farol Bar problem [2]. The model consists of agents who have to repeatedly choose between two alternatives; at each step, the agents who belong to the minority group are considered the winners. In the MG, each agent is affected only by the aggregate behavior of all other agents, and the model generated a lot of interest in the expectation that it could be solved exactly using mean-field techniques well known in statistical physics [3]. Unfortunately, this has not been possible so far, owing to the quenched disorder in the problem, in the form of a random basket of strategies assigned to each agent at the beginning of the game.

In this paper, we propose a variation of the minority game without quenched disorder, with focus on the case of a finite number $N$ of agents who use stochastic strategies. Unlike in the original MG, the agents are fully rational, and not adaptive. The absence of quenched disorder makes our model much more tractable than the original MG, and the use of stochastic, instead of deterministic, strategies makes it more efficient. The self-consistency requirement on optimal strategies gives us coupled algebraic equations in $N$ variables, which are the equivalent of exact mean-field equations in this problem. Interestingly, the non-equilibrium steady state shows a non-analytic dependence on a control parameter even for a finite number of agents. The simplicity of its analysis makes this an interesting model of interacting, competing agents.

In addition, we propose a new solution concept, as an alternative to the usual notion of Nash equilibrium. We show that the Nash equilibrium concept here is very unsatisfactory, giving rise to 'trapping states' (discussed below), and our proposed alternative, to be called co-action equilibrium, avoids this problem.



In a standard MG setting, each of the $N$ agents, with $N$ odd, has to choose between two alternatives (say two restaurants A and B) each day; those in the restaurant with fewer people get a payoff 1, and the others 0. The agents cannot communicate with each other, and make their choice based only on the record of which restaurant was in the minority on each of the last $m$ days. Each agent has a small number of strategies available to him/her. An agent decides which restaurant to go to using the strategy that performed best in the recent past, based on some rule that evaluates past performance. Thus the time-evolution is deterministic, but there is quenched randomness in the strategies assigned to agents. The general qualitative behavior of the model is quite well understood from simulations. The problem has been solved exactly only in the limit of large $N$ and $m$, with $2^m/N = \alpha$ held fixed, and only for $\alpha$ greater than a critical value $\alpha_c$, using concepts and formalism developed originally for the
spin-glass problem [H, . For a more detailed discussion, 
and a review of earlier work see [H-l^ . 

In our model, the common information is more detailed: each agent knows the time-series $\{n(t)\}$ of how many people went to A on different days in the past. We assume that agents are selfish and rational, and that any agent X only wants to optimize her weighted expected future payoff,

$$\mathrm{ExpPayoff}(X) = \sum_{\tau=0}^{\infty} \left[(1-\lambda)\lambda^{\tau}\right] \langle W_X(\tau+1)\rangle \qquad (1)$$

where $\langle W_X(\tau)\rangle$ is the expected payoff of the agent X on the $\tau$-th day ahead, and $\lambda$ is a parameter with $0 \leq \lambda < 1$. It is called the discount parameter in the economics literature. We assume that all agents look equally far ahead into the future, i.e., they all use the same value of the parameter $\lambda$.
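The weighting in Eq. (1) can be illustrated with a short Python sketch. It truncates the infinite sum to a finite list of per-day expected payoffs, which is an assumption made only for illustration; the function name `discounted_payoff` is hypothetical and not part of the paper.

```python
def discounted_payoff(expected_payoffs, lam):
    """Weighted sum of per-day expected payoffs, Eq. (1) cut off at a finite horizon.

    expected_payoffs[tau] is the expected payoff on day tau + 1 ahead.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must satisfy 0 <= lam < 1")
    return sum((1.0 - lam) * lam**tau * w
               for tau, w in enumerate(expected_payoffs))

# lam = 0: only the next day's payoff counts.
print(discounted_payoff([0.25, 1.0, 1.0], 0.0))
# A constant payoff stream keeps its value, because the weights sum to 1.
print(discounted_payoff([0.25] * 2000, 0.9))
```

The factor $(1-\lambda)$ normalizes the weights, so the discounted payoff is directly comparable to a per-day payoff for any $\lambda$.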

In our model, the agents use a stochastic strategy to choose the restaurant. An important feature of the MG is that agents self-organize into a state of high social efficiency. Probabilistic strategies perform better in this respect, as the scoring methods used to select strategies in the deterministic MG are not very effective. This is also seen from the well-known observation that the average performance of an agent using the least-scoring strategy can be better than that of peers using the highest-scoring ones [9].

In our formulation, each agent selects a probability $p$, then generates a new random number uniformly between 0 and 1, and switches her choice from the previous day if it is less than $p$. The choice of $p$ depends on the history of the system and her own history of payoffs in the last $m$ days, and constitutes the strategy of the agent. We restrict our discussion here to the simplest case, where $m = 1$ for all agents. Then the agent's strategy is solely determined by the number of people who were in the restaurant she went to the previous day. This model was first defined in [10]. A somewhat similar model was studied earlier by Reents et al.
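The switching rule for $m = 1$ can be sketched as a one-day update in Python. This is a minimal illustration, not the authors' code: the names `next_day` and `payoffs`, the 0/1 encoding of the two restaurants, and the convention that `p[i]` holds the switching probability for an agent whose restaurant had `i` people (index 0 unused) are all assumptions.

```python
import random

def next_day(choices, p, rng):
    """One day of the m = 1 dynamics.

    choices: list of 0/1, the restaurant each agent visited yesterday.
    p: p[i] is the switching probability of an agent whose restaurant
       held i people yesterday (index 0 is unused).
    """
    counts = [choices.count(0), choices.count(1)]
    new_choices = []
    for c in choices:
        i = counts[c]                    # the agent's state yesterday
        switch = rng.random() < p[i]     # switch if the random number is below p_i
        new_choices.append(1 - c if switch else c)
    return new_choices

def payoffs(choices):
    """Payoff 1 for agents in the less crowded restaurant, 0 otherwise (N odd)."""
    counts = [choices.count(0), choices.count(1)]
    minority = 0 if counts[0] < counts[1] else 1
    return [1 if c == minority else 0 for c in choices]

rng = random.Random(0)
p = [0.0, 0.0, 0.5, 0.5]   # N = 3: p_1 = 0, p_2 = p_3 = 1/2 (illustrative values)
state = [0, 1, 1]
for day in range(5):
    state = next_day(state, p, rng)
    print(day + 1, state, payoffs(state))
```

Passing the random generator explicitly keeps runs reproducible; the particular values in `p` are only an example.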
We say that an agent is in state $|i\rangle$ when she is in a restaurant with total number of people $i$. Let $p_i$ be the probability chosen by an agent when she is in state $|i\rangle$. For a given $N$, a strategy $P$ is defined by the set of $N$ numbers $P = \{p_1, p_2, \ldots, p_N\}$. Clearly, the system undergoes a Markovian evolution, described by a master equation. As each agent can be in one of two states, the state space is $2^N$-dimensional. However, we use the symmetry under permutation of agents to reduce the Markov transition matrix to $N \times N$ dimensions. Let $|\mathrm{Prob}(t)\rangle$ be an $N$-dimensional vector whose $j$-th element is $\mathrm{Prob}_j(t)$, the probability that a marked agent X finds herself in state $|j\rangle$ on the $t$-th day. On the next day, each agent will switch with known probabilities, and we get

$$|\mathrm{Prob}(t+1)\rangle = T\,|\mathrm{Prob}(t)\rangle \qquad (2)$$

where $T$ is the $N \times N$ Markov transition matrix. Explicit matrix elements are easy to write down.
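One way to write down the matrix elements is sketched below in Python. The construction is an assumption based on the rules stated in the text: the marked agent X in state $|j\rangle$ and her $j-1$ co-residents switch with probability $p_j$, and the $N-j$ agents in the other restaurant switch with probability $p_{N-j}$. The function names are hypothetical.

```python
from math import comb

def binom_pmf(n, k, p):
    """Probability of k successes in n independent trials of probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def transition_matrix(p):
    """T[jn-1][j-1] = Prob(X is in state jn tomorrow | state j today).

    p[i-1] is the switching probability p_i for state i = 1..N.
    """
    N = len(p)
    T = [[0.0] * N for _ in range(N)]
    for j in range(1, N + 1):
        pj = p[j - 1]
        here = j - 1                    # co-residents of X
        there = N - j                   # agents in the other restaurant
        pt = p[N - j - 1] if there > 0 else 0.0
        for a in range(here + 1):       # co-residents who leave
            for b in range(there + 1):  # agents who arrive from the other side
                w = binom_pmf(here, a, pj) * binom_pmf(there, b, pt)
                stay = 1 + (here - a) + b     # X's state if she stays
                move = 1 + (there - b) + a    # X's state if she switches
                T[stay - 1][j - 1] += (1 - pj) * w
                T[move - 1][j - 1] += pj * w
    return T

for row in transition_matrix([0.0, 0.3, 0.5]):   # N = 3, illustrative p_2 = 0.3
    print([round(x, 4) for x in row])
```

Each column of the result sums to 1, since it is the distribution of tomorrow's state given today's.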

The total expected payoff of X, given that she is in state $|j\rangle$ at time $t = 0$, is easily seen to be

$$W_j = (1-\lambda)\,\langle L|\,\frac{T}{1-\lambda T}\,|j\rangle$$

where $|j\rangle$ is the vector with only the $j$-th element 1 and the rest zero, and $\langle L|$ is the left-vector $\langle 1, 1, 1, 1, \ldots, 0, 0, 0, \ldots|$, with the first $M = (N-1)/2$ elements 1 and the rest zero.
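The payoff vector can be evaluated numerically without an explicit matrix inverse. The Python sketch below uses the fixed-point relation $W = (1-\lambda)\langle L|T + \lambda W T$, which follows from expanding $T/(1-\lambda T)$ as a geometric series; the function name `discounted_payoffs` and the tolerance are assumptions. The example uses the $N = 3$ transition matrix for fully random switching, where every column is $(1/4, 1/2, 1/4)$.

```python
def discounted_payoffs(T, lam, tol=1e-13):
    """Row vector W with W[j-1] = (1 - lam) <L| T (1 - lam T)^(-1) |j>.

    Iterates W = (1 - lam) L T + lam W T, which converges for lam < 1.
    """
    N = len(T)
    M = (N - 1) // 2
    L = [1.0 if i < M else 0.0 for i in range(N)]   # 1 on the winning states

    def times_T(v):                                  # row vector times matrix
        return [sum(v[i] * T[i][j] for i in range(N)) for j in range(N)]

    base = [(1 - lam) * x for x in times_T(L)]
    W = base[:]
    while True:
        WT = times_T(W)
        new = [base[j] + lam * WT[j] for j in range(N)]
        if max(abs(new[j] - W[j]) for j in range(N)) < tol:
            return new
        W = new

T_random = [[0.25] * 3, [0.5] * 3, [0.25] * 3]       # N = 3, all p_i = 1/2
print(discounted_payoffs(T_random, 0.7))
```

For this random-switching matrix the three entries should all come out close to 1/4, independent of the starting state.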

The most commonly used notion in studying $N$-person games is that of Nash equilibrium: a state of the system in which agent $i$ uses a strategy $S_i$ is a Nash equilibrium if, for all $i$, $S_i$ is the best response of $i$, assuming that all agents $j \neq i$ use the strategy $S_j$.

Consider, for simplicity, the case $\lambda = 0$, where agents optimize only the next day's payoff. Then the state $|M\rangle$ of the MG with $p_M = p_{M+1} = 0$ is a Nash equilibrium, as no agent can gain by switching if the other agents stay put. However, the next day the state then remains the same. Thus we get a frozen steady state that maximizes the number of happy people, but is very unsatisfactory for the majority of agents, as they are on the losing side for all future days.



In fact, in the Nash equilibrium concept, an agent in the state $|M+1\rangle$, with $p_M = 0$, is advised that her best strategy is to set $p_{M+1} = 0$. If other agents in the restaurant switch with probability $p_{M+1} \neq 0$, this is the 'optimal' solution. This does not take into account the fact that if all agents follow this advice, their expected future gain is zero, which is clearly unsatisfactory: no other advice could do worse!

The problem with the analysis lies in the Nash assumption of optimizing over the strategies of one agent, assuming the other agents would do as before. In the alternate co-action equilibrium concept proposed here, the agent in a state $|i\rangle$ realizes that she can choose her switching probability $p_i$, but all the other $(i-1)$ fully rational agents in the same restaurant, with the same information available, would argue similarly, and choose the same value of $p_i$. Determining the optimal value of $p_i$ that maximizes $W_i$ does not need communication between agents. But the stability condition becomes the condition that the expected weighted gain $W_i$ does not increase if $p_i$ is changed to a different value.
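The contrast between the two concepts can be made concrete for $\lambda = 0$ in the trapping-state setting, where the $M$ minority agents stay put ($p_M = 0$) and all $M+1$ majority agents switch with a common probability $p$. Under these assumptions (ours, for illustration only), a majority agent wins the next day only if she stays and at least one co-resident leaves, which gives the payoff $(1-p)\left[1-(1-p)^M\right]$. The function names below are hypothetical, and for $N > 3$ this setting is not the full equilibrium discussed later.

```python
def next_day_payoff_majority(p, M):
    """Expected next-day payoff of a majority agent (state M+1) when all
    M+1 majority agents switch with the same probability p and the M
    minority agents stay (p_M = 0). Illustrative lam = 0 calculation."""
    x = 1.0 - p                  # probability that the agent stays
    return x * (1.0 - x**M)      # stays, and at least one of M co-residents leaves

def coaction_p(M):
    """Common p maximizing the payoff above: 1 - (M+1) x^M = 0 with x = 1 - p."""
    return 1.0 - (M + 1) ** (-1.0 / M)

M = 1                            # N = 3
print("Nash advice p = 0:", next_day_payoff_majority(0.0, M))
print("co-action p:", coaction_p(M), next_day_payoff_majority(coaction_p(M), M))
```

For $N = 3$ this reproduces the value $p_2 = 1/2$ at $\lambda = 0$ quoted in the text, while the advice $p = 0$ yields zero payoff.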

One can think of co-action equilibrium on any given day as a two-person game, where the two persons are the majority and the minority groups, and they select the optimal values of their strategy parameters $p_i$ and $p_{N-i}$. But these groupings are temporary, and change with time. In our model, the complete symmetry between the agents, and the assumption of their being fully rational, ensures that they will reach co-action equilibrium. We do not discuss here whether such equilibria are actually achieved in real-life situations.

We now discuss the equilibrium choice $\{p_1^*, p_2^*, \ldots, p_N^*\}$. The co-action equilibrium condition implies conditions on the $N$ parameters $\{p_i^*\}$. There can be more than one self-consistent solution to the equations, and each solution corresponds to a possible steady state.

One simple choice is that $p_i^* = 1/2$ for all $i$, which is the random state, where each agent just picks a restaurant totally randomly each day, independent of history. We will denote this strategy by $P_{rand}$. In the corresponding steady state, it is easy to see that $W_j$ is independent of $j$, and given by

$$W_j = W_{rand} = \frac{1}{2} - \binom{N-1}{M}\,2^{-N}, \quad \text{for all } j. \qquad (3)$$
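The random-state payoff can be checked by direct enumeration. The Python sketch below compares the closed form $W_{rand} = 1/2 - \binom{N-1}{M} 2^{-N}$ with a brute-force count over all $2^N$ equally likely configurations; the function names are hypothetical.

```python
from itertools import product
from math import comb

def w_rand_formula(N):
    """Closed form for the payoff per agent under fully random choices."""
    M = (N - 1) // 2
    return 0.5 - comb(N - 1, M) / 2**N

def w_rand_bruteforce(N):
    """Probability that agent 0 is in the minority when all N agents
    pick a restaurant uniformly at random."""
    wins = 0
    for config in product((0, 1), repeat=N):
        same = sum(1 for c in config if c == config[0])  # size of agent 0's restaurant
        if same <= (N - 1) // 2:
            wins += 1
    return wins / 2**N

for N in (3, 5, 7):
    print(N, w_rand_formula(N), w_rand_bruteforce(N))
```

For $N = 3$ both give 1/4, the value of $W_{rand}$ used in the supplementary material.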



For any strategy $P$, we define an inefficiency parameter $\eta$ as

$$\eta = \frac{W_{max} - W_{avg}(P)}{W_{max} - W_{rand}} \qquad (4)$$

where $W_{max} = M/N$ is the maximum possible payoff per agent, and $W_{avg}(P)$ is the average payoff per agent in the steady state for a given $\lambda$.

By the symmetry of the problem, it is clear that we must have $p_N^* = 1/2$ for all $\lambda$. Now consider the strategy $\{p_i^*\} = \{p_1^*, 1/2, 1/2, 1/2, \ldots\}$. If X is in state $|1\rangle$, and next day all other agents will switch with probability 1/2, it does not matter whether X switches or not: the payoffs $W_1$ and $W_{N-1}$ are independent of $p_1^*$. Hence $p_1^*$ can be chosen to have any value. In the Supplementary Material it is shown that the strategy $P'_{rand}$, in which $p_1^* = 0$ and $p_{N-1}^* < 1/2$ is chosen to maximize $W_{N-1}$, is better for all agents, and hence is always preferred over $P_{rand}$.

FIG. 1: $N = 3$: (a) Variation of $p_2^*$ with $\lambda$, and (b) the optimum payoffs $W_i^*$ ($i = 1$ to 3), as functions of $\lambda$.

As a simple application, consider first the case $N = 3$. Since $p_1^* = 0$ and $p_3^* = 1/2$, the only free parameter is $p_2^*$. This is determined by maximizing $W_2(\lambda)$ with respect to $p_2$, and this gives the optimal strategy for any $\lambda$. It is found that $p_2^*$ decreases monotonically with $\lambda$ from its value 1/2 at $\lambda = 0$, as shown in Fig. 1a. The payoff of agents in various states with this optimum strategy is shown in Fig. 1b.

It is easily verified that the average payoff per agent per day $W_{avg}$ in the steady state for $N = 3$ is a monotonically increasing function of $\lambda$, and leads to the best possible solution $W_{max} = 1/3$ as $\lambda \to 1$. Some more details may be found in [12].

N = 5: We can similarly determine the optimal strategy for $N = 5$. This is characterized by the five parameters $\{p_1^*, p_2^*, p_3^*, p_4^*, p_5^*\}$. The simplest strategy is $P_{rand}$, which corresponds to $p_i^* = 1/2$ for all $i$. As explained above, the strategy $P'_{rand} = (0, 1/2, 1/2, p_4^*(\lambda), 1/2)$ gives higher payoff than $P_{rand}$ for all agents, for all $\lambda$.

Consider agents in states $|2\rangle$ and $|3\rangle$. What values of $p_2$ and $p_3$ would they select, given their expectation/belief about the selected values of $p_1$, $p_4$ and $p_5$? Let the expected gain of agents in state $|2\rangle$ or $|3\rangle$ under $P'_{rand}$ be $W'$. We can study the variation of the payoffs $W_2$ and $W_3$ as functions of $p_2$ and $p_3$ for fixed values of $p_1$, $p_4$, $p_5$ and $\lambda$.

Let us denote by $p_2^*(p_3)$ the best response of agents in state $|2\rangle$ (the one that maximizes $W_2$) if the agents in the opposite restaurant jump with probability $p_3$. Similarly, $p_3^*(p_2)$ denotes the best response of agents in state $|3\rangle$ when those in the opposite restaurant jump with probability $p_2$.

In Fig. 2 we plot the functions $p_2^*(p_3)$ and $p_3^*(p_2)$ in the $(p_3, p_2)$ plane, for three different representative values of $\lambda$. For small $p_3$, $p_2^*(p_3)$ remains zero, and its graph sticks to the x-axis initially (segment OA in the figure), and then increases monotonically with $p_3$. The strategy $P'_{rand}$ is the point (1/2, 1/2), denoted by P. We also show the lines PC, where $W_3 = W'$, and PD, where $W_2 = W'$. For all points in the curvilinear triangle PCD, both $W_2$ and $W_3$ are greater than $W'$. Clearly, possible equilibrium points are the points on the curves $p_2^*(p_3)$ or $p_3^*(p_2)$ that lie within the curvilinear triangle PCD.

For small $\lambda$ (shown in Fig. 2a for $\lambda = 0.1$), the point A is to the left of C, and the only possible self-consistent equilibrium point is P. This implies that the agents would choose $p_2 = p_3 = 1/2$. This situation continues for all $\lambda < \lambda_{c1} = 0.195 \pm 0.001$.

For $\lambda > \lambda_{c1}$, the point A is to the right of C. This is shown in Fig. 2b, for $\lambda = 0.4$. In this case, possible equilibrium points lie on the line segment CA, and out of these, A will be chosen by agents in state $|3\rangle$. At A, both $W_2$ and $W_3$ are greater than $W'$, and hence this would be preferred by all. Further optimization of $p_4$ changes $p_3$ and $p_4$ only slightly. The resulting fixed-point self-consistent values of $\{p_i^*\}$ and the corresponding payoff functions are shown in Fig. 3a and 3b respectively.

As we increase $\lambda$ further, for $\lambda > \lambda_{c2}$ [numerically, $\lambda_{c2} = 0.737 \pm 0.001$], the point B comes to the left of A. Out of the possible equilibria lying on the line segment CA, the point preferred by agents in state $|3\rangle$ is no longer A, but B. The self-consistent values of $p_2^*$, $p_3^*$ and $p_4^*$ satisfying these conditions and the corresponding payoffs are also shown in Fig. 3.

The transition at $\lambda_{c1}$ involves a discontinuous switch between strategies, and hence many steady-state quantities also show discontinuous jumps. However, for the transition at $\lambda = \lambda_{c2}$, the different $p_i^*$'s change continuously. In Fig. 3c, we have plotted the inefficiency parameter $\eta$ as a function of $\lambda$. Interestingly, we see that in the range $\lambda_{c1} < \lambda < \lambda_{c2}$, the inefficiency rises as the agents optimize farther into the future. This may appear surprising, as the agents could certainly have used strategies corresponding to lower $\lambda$. It happens because, though the state for larger $\lambda$ is slightly less efficient overall, in it the majority benefits more, as the difference between $W_2^*$ and $W_3^*$ is decreased substantially (Fig. 3b).

Thus, we have shown that the optimal strategies, and hence the (non-equilibrium) steady state of the system, show a non-analytic dependence on $\lambda$ for $N = 5$. For higher values of $N$, the analysis is similar. For the case $N = 7$, we find that there are four thresholds $\lambda_{ci}$, with $i = 1$ to 4. For $\lambda < \lambda_{c1}$, the optimal strategy has the form $(0, 1/2, 1/2, 1/2, 1/2, p_6^*, 1/2)$. For $\lambda_{c1} < \lambda < \lambda_{c2}$, we get $p_3^* = 0$ and $p_4^* < 1/2$. For still higher values $\lambda_{c2} < \lambda < \lambda_{c3}$, agents in states $|2\rangle$ and $|5\rangle$ also find it better to switch to a win-stay-lose-shift strategy, and we get $p_2^* = 0$, $p_5^* < 1/2$. The transitions at $\lambda_{c3}$ and $\lambda_{c4}$ are similar to the second transition for $N = 5$, where points A and B cross each other in the $(p_4, p_3)$ and $(p_5, p_2)$ planes respectively. Numerically, we find $\lambda_{c1} \approx 0.465$, $\lambda_{c2} \approx 0.635$, $\lambda_{c3} \approx 0.83$ and $\lambda_{c4} \approx 0.95$. Some more discussion of the case $N = 7$ may be found in [12].

FIG. 2: Region in the $p_2$-$p_3$ plane showing the best responses $p_2^*(p_3)$ (blue) and $p_3^*(p_2)$ (red) for agents in state $|2\rangle$ and $|3\rangle$ respectively, for (a) $\lambda = 0.1$, (b) $\lambda = 0.4$ and (c) $\lambda = 0.8$. The lines PC and PD show the curves $W_3 = W'$ and $W_2 = W'$ respectively. In the curvilinear triangle PCD, all agents do at least as well as at P.

FIG. 3: $N = 5$: (a) Variation of $p_2^*$, $p_3^*$ and $p_4^*$ with $\lambda$, (b) optimum payoffs as functions of $\lambda$, (c) inefficiency $\eta$ as a function of $\lambda$.

Summary and concluding remarks: In this paper, we have analysed a stochastic variant of the minority game, where the $N$ agents are equal (no quenched randomness in strategies given to agents), and this permits an exact solution in terms of $N$ self-consistently determined parameters. The solution shows multiple sharp transitions as a function of the discount parameter $\lambda$, even for finite $N$. Also, the optimal strategy is more efficient than is possible under the deterministic MG.

We note that the original MG was proposed as a model 
of learning, adaptation and co-evolution. However, in our 
model, all agents realize the best strategy, and while the 
actual fortunes of agents fluctuate in time, the optimal 
strategy parameters {p*} remain unchanged. 

Our treatment of the model here differs from that in [10]: in that paper, the game was discussed only for $\lambda = 0$, and in terms of Nash equilibrium. Within the Nash solution concept, it was not clear how to avoid the problem of trapping states, and we had made an ad hoc assumption that whenever the system reaches a trapping state, a major resetting event occurs. In the co-action equilibrium concept proposed here, the decision whether or not to switch to $p \neq 1/2$ is made rationally by the agents themselves.

Generalizations of the model to $m > 1$, or to agents that are not all identical, etc., are easy to define, and appear to be interesting subjects for further study. The technique may be used to study other games with different payoff functions, e.g., agents win when their restaurant has attendance exactly $r$. The case of large $N$ will be discussed in a future publication [13].

We thank P. Grassberger, M. Marsili and J. Stilck for 
their comments on an earlier version of this paper. 
SUPPLEMENTARY MATERIAL

N = 3

The case with the number of agents $N = 3$ is quite straightforward. As explained earlier, $p_1^* = 0$ and $p_3^* = 1/2$ for all $\lambda$, and $p_2^*$ is determined by maximizing $W_2$. Also, $W_{max} = 1/3$ and $W_{rand} = 1/4$. An explicit expression for $W_2$ can easily be written down using the transfer matrix. The transfer matrix is

$$T = \begin{pmatrix} q_2^2 & p_2 q_2 & 1/4 \\ 2 p_2 q_2 & q_2 & 1/2 \\ p_2^2 & p_2^2 & 1/4 \end{pmatrix} \qquad (5)$$



where $q_2 = 1 - p_2$. The eigenvalues and eigenvectors of the transfer matrix are easily determined. The eigenvalues are $\{1,\ \tfrac{1}{4}(1 - 4p_2^2),\ q_2(q_2 - p_2)\}$. The right eigenvectors corresponding to these eigenvalues are $[1, 2, 4p_2^2]$, $[1, 2, -3]$, $[1, -1, 0]$ and the left eigenvectors are $[1, 1, 1]$, $[4p_2^2, 4p_2^2, -3]$, $[2, -1, 0]$ respectively. Normalizing the right eigenvector corresponding to eigenvalue 1 gives the average gain in the steady state, $W_{avg}$, as

$$W_{avg} = \frac{1}{3 + 4(p_2^*)^2}$$



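The payoff $W_2$ is given by

$$W_2 = (1-\lambda)\,\begin{pmatrix} 1 & 0 & 0 \end{pmatrix} \frac{T}{1-\lambda T} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \frac{-4 p_2 q_2 + \lambda p_2 (q_2 - p_2)}{\left(-1 + \lambda q_2 (q_2 - p_2)\right)\left(4 + \lambda (4 p_2^2 - 1)\right)} \qquad (6)$$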
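The closed form for $W_2$ can be cross-checked numerically against the series $(1-\lambda)\sum_{\tau \geq 0} \lambda^{\tau} T^{\tau+1}$ applied to the state $|2\rangle$. The Python sketch below is such a check under the assumption that the first row of $T$ is $(q_2^2,\ p_2 q_2,\ 1/4)$; the function names and the truncation at 2000 terms are illustrative choices.

```python
def t_matrix_n3(p2):
    """N = 3 transfer matrix for p_1 = 0, p_3 = 1/2 (columns sum to 1)."""
    q2 = 1.0 - p2
    return [[q2 * q2, p2 * q2, 0.25],
            [2 * p2 * q2, q2, 0.5],
            [p2 * p2, p2 * p2, 0.25]]

def w2_series(p2, lam, terms=2000):
    """(1 - lam) * sum over tau of lam^tau * Prob(state 1 on day tau + 1 | state 2 today)."""
    T = t_matrix_n3(p2)
    v = [0.0, 1.0, 0.0]                 # start in state |2>
    total, weight = 0.0, 1.0 - lam
    for _ in range(terms):
        v = [sum(T[i][j] * v[j] for j in range(3)) for i in range(3)]
        total += weight * v[0]          # payoff 1 only in state |1>
        weight *= lam
    return total

def w2_closed(p2, lam):
    """Closed-form expression for W_2."""
    q2 = 1.0 - p2
    num = -4 * p2 * q2 + lam * p2 * (q2 - p2)
    den = (-1 + lam * q2 * (q2 - p2)) * (4 + lam * (4 * p2 * p2 - 1))
    return num / den

for p2, lam in [(0.5, 0.0), (0.3, 0.5), (0.2, 0.9)]:
    print(p2, lam, w2_series(p2, lam), w2_closed(p2, lam))
```

The two columns of output should agree to within the truncation error of the series, which is negligible for the values of $\lambda$ used here.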
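The equation determining $p_2^*$ as a function of $\lambda$ is:

$$16 - 32p_2^* - \left(24 - 56p_2^* + 32p_2^{*2}\right)\lambda + \left(9 - 28p_2^* + 40p_2^{*2} - 96p_2^{*3} + 144p_2^{*4} - 64p_2^{*5}\right)\lambda^2 - \left(1 - 4p_2^* + 8p_2^{*2} - 24p_2^{*3} + 48p_2^{*4} - 32p_2^{*5}\right)\lambda^3 = 0 \qquad (7)$$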

It is easy to verify that, for $\lambda = 1 - \epsilon$, the average payoff per agent per day in the steady state is given by

$$W_{avg} = 1/3 - K\epsilon^{2/3} + O(\epsilon)$$

where $K$ is a numerical constant.

With the $(0, p_2^*, 1/2)$ strategy, the inefficiency $\eta$ as a function of $\lambda$ is shown in Fig. 4. As we can see, $\eta$ decreases monotonically with $\lambda$, tending to 0 as $\lambda \to 1$. Therefore the inefficiency is minimum, or the resource utilization is best, when the agents are optimizing for their long-term gains.
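The dependence of $p_2^*$ and $\eta$ on $\lambda$ for $N = 3$ can be reproduced approximately with a few lines of Python. The sketch maximizes the closed-form $W_2$ by ternary search, which assumes that $W_2$ has a single maximum in $p_2$ on $[0, 1]$; the function names and iteration count are illustrative assumptions.

```python
def w2(p2, lam):
    """Closed-form W_2 for N = 3 with p_1 = 0 and p_3 = 1/2."""
    q2 = 1.0 - p2
    num = -4 * p2 * q2 + lam * p2 * (q2 - p2)
    den = (-1 + lam * q2 * (q2 - p2)) * (4 + lam * (4 * p2 * p2 - 1))
    return num / den

def optimal_p2(lam, iters=200):
    """Maximize W_2 over p2 by ternary search (assumes a single maximum)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if w2(a, lam) < w2(b, lam):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)

def inefficiency(p2):
    """eta = (W_max - W_avg) / (W_max - W_rand) with W_avg = 1 / (3 + 4 p2^2)."""
    w_avg = 1.0 / (3.0 + 4.0 * p2 * p2)
    w_max, w_rand = 1.0 / 3.0, 0.25
    return (w_max - w_avg) / (w_max - w_rand)

for lam in (0.0, 0.3, 0.6, 0.9):
    p2 = optimal_p2(lam)
    print(lam, round(p2, 4), round(inefficiency(p2), 4))
```

A root finder applied to the polynomial condition for $p_2^*$ would be an alternative; direct maximization is used here because it needs only the expression for $W_2$.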



Proof that $P'_{rand}$ is better than $P_{rand}$, for all agents

Consider the strategy $\{p_i^*\} = \{p_1^*, 1/2, 1/2, 1/2, \ldots\}$. If X is in state $|1\rangle$, and next day all other agents will switch with probability 1/2, then, by the symmetry between the two restaurants, it clearly does not matter whether X switches or not: the payoffs $W_1$ and $W_{N-1}$ are independent of $p_1^*$. Hence $p_1^*$ can be chosen to have any value.

Here the transition matrix may be simplified to a $3 \times 3$ matrix, involving probabilities of the states of the marked agent X: the states $|1\rangle$, $|N-1\rangle$, and a state corresponding to all other possibilities, denoted by $|else\rangle$. It is then straightforward to write down an explicit expression for $W_{N-1}$, and the optimal value is found to be $p_{N-1}^*(\lambda) < 1/2$, for all $\lambda > 0$. We call this strategy $P'_{rand}$.

One can study the payoffs $W_1$ and $W_{N-1}$ in the $(p_{N-1}, p_1)$ plane, as in Fig. 2 of the paper. Clearly, here the line of constant $W_1 = W_{rand}$ is the vertical line $p_{N-1} = 1/2$. Then, the point $(p_{N-1}^*, 0)$ corresponds to point B of Fig. 2(c).

Clearly, at B, both $W_1$ and $W_{N-1}$ are greater than $W_{rand}$. If the agent is in the $|else\rangle$ state, she either remains in the same state, or makes a transition to $|1\rangle$ or $|N-1\rangle$, where she would on average do better than $W_{rand}$. Hence the expected payoff when the agent is in the state $|else\rangle$ cannot be worse than $W_{rand}$.

Thus $P'_{rand}$ is better than $P_{rand}$ for all agents.



Higher N

We also note that, using the symmetry under permutation of agents, we can block diagonalize the transfer matrix $T$ into two blocks of size $M$ and $M + 1$. This is achieved by a change of basis, from the vectors $|i\rangle$ and $|N-i\rangle$ to the basis vectors $|s_i\rangle$ and $|a_i\rangle$, where

$$|s_i\rangle = |i\rangle + |N-i\rangle, \qquad |a_i\rangle = (N-i)\,|i\rangle - i\,|N-i\rangle.$$

N = 7

We present some graphs for the solution for $N = 7$. Fig. 5 shows the variation of the optimum switch probabilities in various states, and Fig. 6 shows the variation of the optimum payoffs. Fig. 7 shows the variation of inefficiency with $\lambda$.

An interesting consequence of the symmetry between the two restaurants is the following: if there is a solution $\{p_i^*\}$ of the self-consistent equations, another solution with all payoffs unchanged can be obtained by choosing any $j$ and forming the solution $\{p_i^{*\prime}\}$ given by $p_j^{*\prime} = 1 - p_j^*$, $p_{N-j}^{*\prime} = 1 - p_{N-j}^*$, and $p_i^{*\prime} = p_i^*$ for $i \neq j$ or $(N - j)$. How agents choose between these $2^M$ symmetry-related equilibria can only be decided by local conventions.




FIG. 5: $N = 7$: Variation of optimum switch probabilities with $\lambda$.